Design and application of middleware for Web full-text retrieval
Wei-gang ZHANG Yong-dong XU Xiao-qiang LEI Hui HE
Journal of Computer Applications
2011, 31(08): 2261-2264.
DOI: 10.3724/SP.J.1087.2011.02261
To provide better Web search services, the key techniques of full-text retrieval were studied, and a middleware was designed and implemented. A multi-threaded website crawler was used to collect the Web pages of the given URLs. The Bloom filter algorithm was employed to remove the large number of duplicate URLs found in the collected pages. A new content extraction approach based on Web page tags was presented to extract the full-text content of pages for indexing and searching, and the experimental results verify the efficiency of this method. Furthermore, to improve the search experience, the middleware provides many personalized search aids. Boso, a blog search engine, was developed to test and verify the middleware. The results show that the middleware can be applied to actual search engines.
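The abstract names the Bloom filter as the URL deduplication technique but gives no details about hash functions or filter size. The following is a minimal, generic sketch of Bloom-filter URL deduplication, not the authors' implementation. The SHA-256 double hashing, the sizing parameters, and the helper name `dedupe_urls` are all illustrative assumptions. A Bloom filter never reports a seen URL as new, but it can occasionally report a new URL as already seen at roughly the configured false-positive rate.

```python
import hashlib
import math


class BloomFilter:
    """Set-membership test with possible false positives but no false negatives."""

    def __init__(self, expected_items: int, fp_rate: float = 0.01) -> None:
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hash functions.
        self.size = math.ceil(-expected_items * math.log(fp_rate) / (math.log(2) ** 2))
        self.num_hashes = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str):
        # Assumed scheme: double hashing, deriving k bit positions from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> None:
        # Set every bit position the item hashes to.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # The item is "probably present" only if all of its bits are set.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def dedupe_urls(urls, expected_items: int = 100_000, fp_rate: float = 0.001):
    """Hypothetical helper: yield each URL the first time it is seen."""
    seen = BloomFilter(expected_items, fp_rate)
    for url in urls:
        if url in seen:
            continue  # probably a duplicate; skip it
        seen.add(url)
        yield url


if __name__ == "__main__":
    urls = ["http://a.example/", "http://b.example/", "http://a.example/"]
    print(list(dedupe_urls(urls)))  # ['http://a.example/', 'http://b.example/']
```

The memory cost is fixed by the expected URL count and the target false-positive rate, not by URL length, which is why a Bloom filter suits large crawls. In a multi-threaded crawler that shares one filter, the check-then-add step is not atomic and would need to be guarded, for example with a `threading.Lock`.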